Search CORE

60 research outputs found

Active Learning with Multiple Views

Author: Knoblock C. A.
Minton S.
Muslea I.
Publication venue: 'AI Access Foundation'
Publication date: 05/10/2011
Field of study

Active learners alleviate the burden of labeling large amounts of data by detecting and asking the user to label only the most informative examples in the domain. We focus here on active learning for multi-view domains, in which there are several disjoint subsets of features (views), each of which is sufficient to learn the target concept. In this paper we make several contributions. First, we introduce Co-Testing, which is the first approach to multi-view active learning. Second, we extend the multi-view learning framework by also exploiting weak views, which are adequate only for learning a concept that is more general/specific than the target concept. Finally, we empirically show that Co-Testing outperforms existing active learners on a variety of real world domains such as wrapper induction, Web page classification, advertisement removal, and discourse tree parsing

arXiv.org e-Print Archive

Crossref

Self-supervised automated wrapper generation for weblog data extraction

Author: A. Laender
B. Adelberg
C. Kohlschütter
I. Muslea
N. Kushmerick
P. Geibel
R. Baumgartner
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013
Field of study

Data extraction from the web is notoriously hard. Of the types of resources available on the web, weblogs are becoming increasingly important due to the continued growth of the blogosphere, but remain poorly explored. Past approaches to data extraction from weblogs have often involved manual intervention and suffer from low scalability. This paper proposes a fully automated information extraction methodology based on the use of web feeds and processing of HTML. The approach includes a model for generating a wrapper that exploits web feeds for deriving a set of extraction rules automatically. Instead of performing a pairwise comparison between posts, the model matches the values of the web feeds against their corresponding HTML elements retrieved from multiple weblog posts. It adopts a probabilistic approach for deriving a set of rules and automating the process of wrapper generation. An evaluation of the model is conducted on a dataset of 2,393 posts and the results (92% accuracy) show that the proposed technique enables robust extraction of weblog properties and can be applied across the blogosphere for applications such as improved information retrieval and more robust web preservation initiatives

Crossref

UCL Discovery

Warwick Research Archives Portal Repository

Learning multiple views with orthogonal denoising autoencoders

Author: A Oliva
CM Bishop
G Hinton
I Muslea
J Bergstra
J de Weijer van
M Gönen
S Dasgupta
S Sun
S Sun
S Yu
W Liu
Y Bengio
Y Bengio
YA LeCun
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 06/01/2016
Field of study

Multi-view learning techniques are necessary when data is described by multiple distinct feature sets because single-view learning algorithms tend to overt on these high-dimensional data. Prior successful approaches followed either consensus or complementary principles. Recent work has focused on learning both the shared and private latent spaces of views in order to take advantage of both principles. However, these methods can not ensure that the latent spaces are strictly independent through encouraging the orthogonality in their objective functions. Also little work has explored representation learning techniques for multiview learning. In this paper, we use the denoising autoencoder to learn shared and private latent spaces, with orthogonal constraints | disconnecting every private latent space from the remaining views. Instead of computationally expensive optimization, we adapt the backpropagation algorithm to train our model

Crossref

Irish Universities

DCU Online Research Access Service

Automatic wrappers for large scale web extraction

Author: Anton T.
Crescenzi V.
Freitag D.
Freitag D.
Kushmerick N.
Madhavan J.
Muslea I.
Sahuguet A.
Publication venue: 'VLDB Endowment'
Publication date
Field of study

Crossref

An active approach to spoken language processing

Author: Abney S.
Argamon-Engelson S.
Bacchiani M.
Baldridge J.
Blum A.
Digalakis V.
Dilek Hakkani-Tür
Eskin E.
Federico M.
Ghani R.
Giuseppe Riccardi
Godfrey J. J.
Gokhan Tur
Gretter R.
Hakkani-Tür D.
Hakkani-Tür D.
Hakkani-Tür D.
Hendrickx I.
Iyer R.
Kearns M.
Kuo J.
Leggetter C.
Leggetter M.
Lewis D. D.
Liere R.
Marcus M. P.
McCallum A. K.
Murata M.
Muslea I.
Natarajan P.
Nigam K.
Price P. J.
Riccardi G.
Riccardi G.
Riccardi G.
Sassano M.
Schapire R. E.
Schohn G.
Seung H. S.
Tang M.
Thompson C.
Tur G.
Tur G.
Tur G.
Tur G.
Tur G.
van Halteren H.
Zavaliagkos G.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date
Field of study

Crossref

Accurately and Reliably Extracting Data from the Web: A Machine Learning Approach

Author: C Hsu
I Muslea
N Kushmerick
S Soderland
T Goan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2003
Field of study

Crossref

Wrapper Maintenance for Web-Data Extraction Based on Pages Features

Author: C A Knoblock
I. Muslea
Lerman Kristina
N. Kushmerick
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006
Field of study

Crossref

Bottom-Up Learning of Logic Programs for Information Extraction from Hypertext Documents

Author: I. Muslea
J. Lloyd
J.R. Quinlan
N. Kushmerick
R. Reiter
Publication venue
Publication date: 01/01/2003
Field of study

We present an inductive logic programming bottom-up learning algorithm (BFOIL) for synthesizing logic programs for multi-slot information extraction from hypertext documents. BFOIL learns from positive examples only. Furthermore we introduce a logical and relational based representation for hypertext documents (TDOM). We briefly discuss several BFOIL refinements and show very promising results of our system LIPX in comparison to state of the art IE systems

CiteSeerX

Crossref

Selective Sampling Based on Dynamic Certainty Propagation for Image Retrieval

Author: A.W.M. Smeulders
I. Muslea
J. Cheng
M.S. Lew
Y. Rui
Z.H. Zhou
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008
Field of study

Crossref

An Investigation on Genetic Algorithms for Generic STRIPS Planning

Author: D. Goldberg
F. Silva
H. Kautz
H. Kautz
I. Muslea
T. Bylander
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004
Field of study

Crossref